Abstract. A recent method combines model checkers with specification-based
mutation analysis to generate test cases from formal software
specifications. However, high-level software specifications usually must be
reduced to make analysis with a model checker feasible.

We  propose  a new  reduction,  parts  of  which  can  be  applied  mechanically, 
to  soundly  reduce  some  large,  even  infinite,  state  machines  to  manageable 
pieces.  Our  work  differs  from  other  work  in  that  we  use  the  reduction 
for  generating  test  sets,  as  opposed  to  the  typical  goal  of  analyzing  for 
properties.  Consequently,  we  have  different  criteria,  and  we  prove  a dif- 
ferent soundness  rule.  Informally,  the  rule  is  that  counterexamples  from 
the  model  checker  are  test  cases  for  the  original  specification.  The  reduc- 
tion changes  both  the  state  machine  and  temporal  logic  constraints  in 
the  model  checking  specification  to  avoid  generating  unsound  test  cases. 
We  give  an  example  of  the  reduction  and  test  generation. 


1 Introduction 

The  use  of  formal  methods  has  been  widely  advocated  to  reduce  the  likelihood 
of  errors  in  early  stages  of  system  development.  Some  of  the  chief  drawbacks  to 
applying  formal  methods  are  the  difficulty  of  conducting  formal  analysis  [6]  and 
the  perceived  or  actual  payoff  in  project  budget.  Testing  is  a substantial  part  of 
the  software  budget,  and  formal  methods  offer  an  opportunity  to  significantly 
reduce  testing  costs,  thereby  making  formal  methods  more  attractive  from  the 
budget  perspective. 

The  authors  developed  an  innovative  combination  of  mutation  analysis,  sym- 
bolic model  checking,  and  test  generation  which  solves  some  problems  previously 
plaguing  these  approaches  and  automatically  produces  good  sets  of  tests  from 
formal  specifications  [1,2].  This  approach  is  useful  only  if  there  is  a specification 
amenable  to  model  checking  for  a given  application.  Here  we  seek  to  widen  the 
class  of  applications  for  which  automatic  test  generation  via  a model  checker  is 
feasible. Our approach is to define a reduction from a given specification to a
smaller  one  that  is  more  likely  to  be  tractable  for  a model  checker.  We  tailor  the 
reduction  for  test  generation,  as  opposed  to  the  usual  goal  of  analysis. 

A broad  span  of  research  from  early  work  on  algebraic  specifications  [13] 
to  more  recent  work  such  as  [21]  addresses  the  problem  of  relating  tests  to 
formal  specifications.  In  particular,  counterexamples  from  model  checkers  are 
potentially  useful  test  cases.  In  addition  to  our  use  of  the  Symbolic  Model  Ver- 
ifier (SMV)  model  checker  [19]  to  generate  mutation  adequate  tests  [2],  Calla- 
han, Schneider,  and  Easterbrook  use  the  Simple  PROMELA  Interpreter  (SPIN) 
model  checker  [16]  to  generate  tests  that  cover  each  block  in  a certain  parti- 
tioning of  the  input  domain  [8].  Gargantini  and  Heitmeyer  use  both  SPIN  and 
SMV  to  generate  branch-adequate  tests  from  Software  Cost  Reduction  (SCR) 
requirements  specifications  [14]. 

The  model  checking  approach  to  formal  methods  has  received  considerable 
attention  in  the  literature,  and  readily  available  tools  such  as  SMV  and  SPIN 
are  capable  of  handling  the  state  spaces  associated  with  realistic  problems  [11]. 
Although  model  checking  began  as  a method  for  verifying  hardware  designs, 
there  is  growing  evidence  that  model  checking  can  be  applied  with  considerable 
automation  to  specifications  for  relatively  large  software  systems,  such  as  the 
Traffic Alert & Collision Avoidance System (TCAS) II [9]. The increasing
usefulness of model checkers for software systems makes them attractive targets for
use  in  aspects  of  software  development  other  than  pure  analysis,  which  is  their 
primary  role  today. 

Model  checking  has  been  successfully  applied  to  a wide  variety  of  practi- 
cal problems,  including  hardware  design,  protocol  analysis,  operating  systems, 
reactive  system  analysis,  fault  tolerance,  and  security.  The  chief  advantage  of 
model  checking  over  the  competing  approach  of  theorem  proving  is  complete 
automation.  Whereas  human  interaction  is  generally  required  to  prove  all  but 
the  simplest  theorems,  model  checkers  can  explore  the  state  spaces  for  finite,  yet 
realistic,  problems  without  human  guidance. 

A model  checking  specification  consists  of  two  parts.  One  part  is  a state 
machine  defined  in  terms  of  variables,  initial  values  for  the  variables,  environ- 
mental assumptions,  and  a description  of  the  conditions  under  which  variables 
may  change  value.  The  other  part  is  temporal  logic  constraints  over  states  and 
execution  paths.  Conceptually,  a model  checker  visits  all  reachable  states  and 
verifies  that  the  temporal  logic  properties  are  satisfied  over  each  possible  path. 
Model  checkers  exploit  clever  ways  of  avoiding  brute  force  exploration  of  the 
state  space,  for  example,  see  [7].  If  a property  is  not  satisfied,  the  model  checker 
attempts  to  generate  a counterexample  in  the  form  of  a trace  or  sequence  of 
states.  For  some  temporal  logic  properties,  no  counterexample  is  possible.  For 
example,  if  the  property  states  that  at  least  one  possible  execution  path  leads  to 
a certain  state  and  in  fact  no  path  leads  to  that  state,  there  is  no  counterexample 
to  exhibit. 
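The conceptual search described above can be sketched for the simplest case, an invariant of the form AG p, as a breadth-first exploration that returns a shortest path to a violating state. This is an illustrative explicit-state sketch only and does not reflect how SMV or SPIN work internally; the name `find_counterexample` and the toy counter machine are assumptions made for the example.

```python
from collections import deque

def find_counterexample(initial, successors, invariant):
    """Return a shortest path from `initial` to a state violating
    `invariant`, or None if every reachable state satisfies it."""
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:  # walk back to the initial state
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy machine: a counter modulo 4 that may step or stay put.
def successors(n):
    return [n, (n + 1) % 4]

trace = find_counterexample(0, successors, lambda n: n != 3)
```

A property that only asserts the existence of a path, such as EF p, has no such trace to return when it fails, which matches the remark above.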

Even though model checkers are powerful formal “compute engines,” clever
abstractions are required for problems of even modest complexity to avoid the
state space explosion problem, which renders the model checker useless. Some of
these abstractions are informal, although there have been significant
formalizations of the abstraction process [4, 9, 18]. Other abstractions [20] are
formalized to the extent of being paired with theorem provers and model checkers
to calculate and refine them. These abstractions are directed at the analysis problem,
that  is,  determining  whether  given  properties  expressed  in  a temporal  logic  hold 
over  a given  state  machine. 

In  this  paper,  we  focus  on  test  generation  instead  of  analysis,  and  there- 
fore test  requirements  drive  our  abstraction.  The  basic  property  needed  for  test 
generation, which we term reduction soundness, is that if a counterexample is
generated  in  the  abstraction,  the  counterexample  is  also  a test  case  in  the  origi- 
nal specification.  Reduction  soundness  does  not  necessarily  hold  for  abstractions 
designed  for  analysis,  although  other  soundness  properties  may  hold. 

Our  contributions  in  this  paper  are: 

1.  We  provide  a new  reduction  called  finite  focus  for  abstracting  state  machine 
specifications. 

2.  We  develop  a notion  of  soundness  that  is  suitable  for  test  generation,  and 
we  show  that  the  finite  focus  reduction  is  sound. 

3.  We  define  a notion  of  mutation  adequacy  for  mutation  testing  of  model 
checking  specifications,  and  describe  the  subset  of  mutants  for  which  finite 
focus  generates  a mutation  adequate  test  set.  Informally,  they  are  mutants 
that  can  be  distinguished  via  traces  from  the  finite  focus  reduction. 

Section  2 is  an  overview  of  the  mutation  analysis  approach  for  generating  tests 
and  measuring  coverage  with  a model  checker  described  in  [1,2],  and  Sect.  3 ex- 
plains how  the  finite  focus  reduction  fits  into  this  approach.  Section  4 formally 
defines  the  reduction  and  proves  soundness  properties.  Although  significant  parts 
of  the  reduction  are  theoretically  underconstrained,  Sect.  5 describes  additional 
considerations,  particularly  for  mutation  adequacy.  Section  6 presents  an  exam- 
ple of  using  finite  focus  to  generate  tests  for  validation.  Section  7 gives  our  plans 
for  future  work.  Finally,  we  present  our  conclusions  in  Sect.  8.  Appendix  A is 
a full  proof  of  the  rewriting  rules  used  in  the  reduction.  Appendix  B gives  the 
model  used  in  the  example  and  an  alternate  model,  and  App.  C gives  the  tests. 

2 Automatic  Test  Generation 

Figure  1 shows  the  overall  approach  explained  in  detail  in  [1,2].  One  begins 
with  some  system  specifications  and,  through  finite  modeling  and  with  the  aid 
of  automated  tools,  turns  them  into  specifications  suitable  for  a model  checker. 
After  this  point  all  processing  can  be  automatic. 

2.1  Background  on  Mutation  Analysis 

Standard mutation analysis [12] is a method based on program source code to
develop a set of test cases which is sensitive to small syntactic changes to a
program. The rationale is that if a test set can distinguish a program from each
of its slight variations, the test set is exercising the program adequately.

Fig. 1. Automatic Test Generation in [1]

A mutation analysis system defines a set of mutation operators. Each operator
is a pattern for a small syntactic change. A mutant program, or more simply,
mutant, is produced by applying a single mutation operator exactly once to the
original  program.  Applying  the  set  of  operators  systematically  generates  a set 
of  mutant  programs.  Some  of  these  mutants  may  be  semantically  equivalent  to 
the  original  program.  That  is,  a mutant  and  the  original  may  compute  the  same 
function  for  all  possible  inputs.  Such  mutants  are  termed  equivalent.  Equivalent 
mutants  present  a serious  problem  for  program-based  mutation  analysis,  since 
identifying  equivalent  mutants  is,  in  general,  an  undecidable  problem. 

2.2  Test  Generation  Via  Mutation  Operators 

The specification-based mutation analysis scheme in [1,2] is decidable since its
domain is the finite state space of a model checker specification. To provide a
richer set of constraints, the state machine specification is “reflected” in the
constraints. For instance, a transition from state s1 to s2 on condition c becomes
the constraint SPEC AG s1 & c -> AX s2. Mutation operators are applied to
the constraints, yielding a set of mutant specifications. A “condition substitute”
operator yields SPEC AG s1 & b -> AX s2, among other mutant specifications,
when applied to the above constraint. Other operators change constants,
variables, or boolean operators, drop conditions, etc.
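Reflection and the “condition substitute” operator can be sketched as simple string generation. This is a minimal sketch, not the tooling of [1,2]: the function names `reflect` and `condition_substitute`, the triple encoding of transitions, and the added parentheses in the generated constraints are all assumptions made for illustration.

```python
def reflect(transitions):
    """Turn (source, condition, target) triples into CTL constraints."""
    return [f"SPEC AG ({src} & {cond} -> AX {dst})"
            for src, cond, dst in transitions]

def condition_substitute(transition, conditions):
    """Yield mutants of one transition, each guarded by a different condition."""
    src, cond, dst = transition
    for other in conditions:
        if other != cond:
            yield (src, other, dst)

# The transition from s1 to s2 on condition c, and its mutant on condition b.
transitions = [("s1", "c", "s2")]
original = reflect(transitions)
mutants = reflect(condition_substitute(transitions[0], ["b", "c"]))
```

Each mutant constraint would then be checked against the unmodified state machine; a mutant that is inconsistent with the machine yields a counterexample.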

The  model  checker  compares  the  (assumed-good)  state  machine  with  the 
mutants.  When  it  finds  an  inconsistency,  it  generates  a counterexample.  Equiv- 
alent mutants  produce  no  counterexamples  and  therefore  are  automatically  dis- 
regarded. 

The  set  of  counterexamples  is  reduced  by  eliminating  duplicates  and  coun- 
terexamples which  are  “prefixes”  of  other,  longer  counterexamples.  The  coun- 
terexamples contain  both  stimuli  and  expected  output  values  so  they  may  be 
automatically converted to complete test cases. The test cases generate
executable test code, including a test harness and drivers. For a given set of
mutation operators, the procedures in [1,2] generate a mutation-adequate set of test
cases.
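The pruning of counterexamples described above can be sketched as follows, assuming each counterexample is represented as a sequence of hashable states. The function name `prune` and this representation are assumptions for illustration.

```python
def prune(counterexamples):
    """Drop duplicate traces and traces that are prefixes of longer ones."""
    # Deduplicate, then visit the longest traces first so that every
    # possible extension of a trace is already in `kept` when it is checked.
    unique = sorted(set(map(tuple, counterexamples)), key=len, reverse=True)
    kept = []
    for trace in unique:
        if not any(longer[:len(trace)] == trace for longer in kept):
            kept.append(trace)
    return kept

pruned = prune([["a", "b"], ["a"], ["a", "b"], ["c"]])
```

Here the duplicate of `["a", "b"]` and its prefix `["a"]` are removed, while the unrelated trace `["c"]` is retained.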

3 Practical  Test  Generation 

The  preceding  approach  uses  model  checkers  to  process  specifications.  Unfortu- 
nately, symbolic  model  checkers  can  only  handle  finite  state  machines.  In  fact, 
spaces  with  more  than  a few  thousand  states  must  often  be  handled  in  special 
ways.  Yet  specifications  of  realistic  software  often  have  enormous,  even  infinite, 
state  spaces. 

It is often straightforward for an analyst to come up with a smaller model
if  the  original  model  is  too  large  for  the  model  checker.  However,  it  is  generally 
impractical  to  require  large  amounts  of  human  time  to  devise  smaller  models. 
To  leverage  scarce  human  expertise,  we  want  reductions  and  abstractions  which 
are  highly  automated. 


3.1  Other  Reduction  Approaches 

We  know  of  several  existing,  mechanical  approaches  to  reduction  or  abstraction. 
To  be  useful,  abstractions  must  preserve  some  properties  of  the  original.  Two 
useful  measures  are  soundness  and  completeness. 

In  a sound  abstraction,  properties  of  the  reduced  or  abstract  specification 
are  also  properties  of  the  original  specification.  Soundness  avoids  false  positives. 
That  is,  any  error  found  in  the  abstract  specification  (a  “positive”  result)  is  also 
an error in the original specification. In a complete abstraction, properties of the
original specification are also properties of the reduced specification.
Completeness avoids false negatives. That is, all errors in the original specification will be
found in the abstract specification.

Heitmeyer et al. [15] formalize an abstraction which removes irrelevant
variables. Briefly, to check that some property q holds for a specification, one may
remove variables and inputs which do not occur in or contribute to q. Another
abstraction removes monitored or input variables which only contribute directly
to one other variable. Finally, they also describe a method which abstracts
monitored variables. That is, if only certain values or ranges of a monitored
variable influence the values of other variables, the monitored variable may be
replaced with an abstract variable. For instance, consider an input variable, Water
Pressure, with a discrete range from 0 to 2000 whose influence is constant over
low values, over a range of moderate values, and over high values. This
variable may be replaced with a quantized variable which has the values Too_Low,
In_Range, or Too_High. All these abstractions are sound for analysis. The first
is complete, and the other two are complete under conditions which frequently
hold in practice.
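The quantization of a monitored variable can be sketched as a simple mapping. The thresholds 900 and 1100 are hypothetical, since the text gives only the range 0 to 2000 and the three abstract values; the function name `quantize` is likewise an assumption.

```python
def quantize(pressure, low=900, high=1100):
    """Map a water pressure reading in 0..2000 to one of three abstract values.

    The `low` and `high` thresholds are illustrative assumptions.
    """
    if not 0 <= pressure <= 2000:
        raise ValueError("pressure outside the modeled range 0..2000")
    if pressure < low:
        return "Too_Low"
    if pressure > high:
        return "Too_High"
    return "In_Range"

# The 2001 concrete values collapse to three abstract values.
abstract_values = {quantize(p) for p in range(2001)}
```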

Chan et al. [9] use another method to reduce the state size. Some
specifications place time bounds on the intervals between events. The obvious
specification keeps time as an integer, uses variables to record the times of events, and
has predicates on the difference in times. Instead, they keep (bounded) timers
measuring the time since events. When the bounds are exceeded, the timers
enter a “satisfied” (or “unsatisfied”) state.

They  also  use  a temporal  strength  reduction.  Suppose  there  is  a predicate 
on  the  value  from  a previous  state.  Rather  than  saving  the  previous  value  and 
computing  the  predicate,  just  save  the  value  of  the  predicate.  For  instance,  rather 
than save the previous value of an integer y, and then compute prev(y) > 1000,
compute  prev(y  > 1000).  The  abstracted  model  only  need  save  a boolean  value. 
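The temporal strength reduction can be illustrated by two small simulations that produce the same outputs while storing different state. This is a sketch under our own naming; `run_with_previous_value` and `run_with_previous_predicate` are hypothetical helpers, not part of the method in [9].

```python
def run_with_previous_value(ys, limit=1000):
    """Store the previous integer and evaluate prev(y) > limit on demand."""
    prev_y, out = ys[0], []
    for y in ys[1:]:
        out.append(prev_y > limit)
        prev_y = y                  # saved state: one integer
    return out

def run_with_previous_predicate(ys, limit=1000):
    """Store only prev(y > limit), the previous truth value of the predicate."""
    prev_high, out = ys[0] > limit, []
    for y in ys[1:]:
        out.append(prev_high)
        prev_high = y > limit       # saved state: one boolean
    return out

samples = [5, 2000, 1000, 1001]
```

Both functions agree on every input sequence, but the second needs only a single boolean of saved state, which is what keeps the abstracted model small.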

Kurshan [18] explains how k verifications may be done on k reductions of a
system, each of which is a 1/k part of the entire system. Since verification is often
exponential in the size of the system, a verification of the entire system may be
proportional to $c^n$ while k verifications take $k c^{n/k}$ work.
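As a worked illustration of this estimate, read the cost of verifying the whole system as $c^n$ and the cost of $k$ verifications of parts of size $n/k$ as $k c^{n/k}$, and take the hypothetical values $c = 2$, $n = 30$, and $k = 3$:

$$c^n = 2^{30} = 1{,}073{,}741{,}824, \qquad k\,c^{n/k} = 3 \cdot 2^{10} = 3072.$$

Under these assumed values the decomposed verification is smaller by more than five orders of magnitude; the actual savings depend on how well a system decomposes.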

In  an  overview  presentation,  Rushby  [20]  advocates  “ubiquitous  abstraction,” 
that  is,  using  abstractions  in  several  different  ways  in  all  parts  of  analysis.  For 
instance,  even  for  one  given  problem,  different  abstractions  may  be  appropriate, 
depending  on  the  invariants  used  to  prove  a goal.  The  invariants  may  be  au- 
tomatically strengthened  when  the  proof  fails.  Another  noteworthy  approach  is 
calculating  transitions  of  a state  abstraction  using  rules  that  guarantee  correct- 
ness, as  opposed  to  taking  a hand-crafted  abstraction  and  proving  it  is  sound 
and  complete.  Abstractions  may  be  refined  automatically  using  information  from 
static  analysis,  such  as  reachable  states.  In  contrast,  we  are  still  at  the  stage  of 
characterizing  abstractions  in  our  work,  albeit  for  a different  notion  of  soundness, 
rather  than  computing  them. 

Bensalem,  Lakhnech,  and  Owre  [3]  explain  a semi-automated  abstraction  in 
which  the  analyst  chooses  a state  abstraction  and  then  a conservative  (sound)  set 
of  corresponding  transitions  are  computed.  Construction  begins  with  a complete 
set  of  transitions,  that  is,  a transition  from  every  abstract  state  to  every  other 
abstract  state.  If  a transition  can  be  (automatically)  proven  to  be  impossible,  it 
is  removed.  Since  such  proofs  are  in  general  too  complex,  they  combine  it  with 
three  techniques  based  on  partitioning  the  abstract  variables,  substituting,  and 
using  the  property  being  investigated. 


3.2  A New  Reduction 

Since  our  goal  is  automatic  test  generation,  rather  than  property  analysis,  we 
can  use  different  reductions.  For  analysis,  reductions  may  summarize  states  and 
discard  details  of  transitions.  A reduced  model  may  still  be  quite  useful  even  if 
it  is  not  precise.  To  automatically  generate  tests,  we  may  wish  to  retain  details 
in  order  to  easily  determine  if  an  implementation  behaves  properly.  We  can  then 
accumulate  sets  of  tests  generated  from  different  precise  reductions.  In  sum- 
mary, an  abstraction  which  is  perfectly  satisfactory  for  one  purpose,  property, 
or  specification  may  be  unusable  in  another. 

Fig. 2. Focus on a Finite Subset of States


A typical abstraction is to map variables with large or unbounded domains
to a fixed subset of the possible values. For example, an integer variable $x$ might
be modeled with a corresponding variable $x_{model}$, having a bounded range of
0, 1, and 2. From the test generation perspective, the ranges simply need to cover
values which may be interesting when used in actual test cases.

Consider the example of a bank balance in an imaginary currency, the Florin
(ℱ), with operations to deposit and withdraw one Florin. The complete model,
depicted by the top row and labeled S in Fig. 2, uses type natural. However,
the  model  cannot  be  automatically  examined  by  a model  checker.  To  use  the 
analytical  resources  of  a model  checker,  we  must  drastically  reduce  this  model 
to  some  finite  size.  For  instance,  a human  analyst  may  naturally  focus  on  what 
happens  when  the  balance  is  close  to  zero  and  ignore,  for  the  moment,  large 
balances.  Can  we  formalize  this  focusing  on  a subrange  so  that  the  analyst  need 
not  worry  about  making  an  unsound  reduction? 

Suppose we choose to accurately model balances of ℱ0, ℱ1, and ℱ2, and
map anything greater than two to “other.” We need to indicate that the model
checker should ignore any set of operations in which the balance exceeds ℱ2,
since they may not be sound, i.e., may not be accurately represented.

Consider having one constraint on accounts with a balance of ℱ3 and a different
constraint on those with a balance of ℱ4. Both of these balances are mapped
to “other” in the reduction. This loss of accuracy indicates that any execution
path entering “other” is suspect. We record this by adding a “soundness” state
variable which becomes unsound if the state becomes “other,” such as after a deposit
when the balance is ℱ2. The bottom row, labeled $S_R$ in Fig. 2, illustrates this
reduction. We can then have the model checker ignore any unsound
inconsistencies so that it returns only those which are problems in the full model. We
formalize the idea behind this example as a reduction we call “finite focus.”
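The bank example can be simulated with a few lines of code. This is an illustrative sketch only: the names `FOCUS`, `OTHER`, `step`, and `run` are ours, a withdrawal from an empty account is assumed to leave the balance at zero, and transitions leaving “other” are not modeled because such runs are already unsound.

```python
OTHER = "other"
FOCUS = (0, 1, 2)   # balances that the reduction models exactly

def step(state, op):
    """Advance the reduced machine one step; `state` is (balance, sound)."""
    balance, sound = state
    if balance == OTHER:
        return (OTHER, False)            # an unsound run stays unsound
    if op == "deposit":
        balance += 1
    elif op == "withdraw":
        balance = max(balance - 1, 0)    # assumed behavior at zero
    else:
        raise ValueError(f"unknown operation: {op}")
    if balance not in FOCUS:
        return (OTHER, False)            # left the focus: mark unsound
    return (balance, sound)

def run(ops):
    """Return the states visited, starting from a sound zero balance."""
    state = (0, True)
    trace = [state]
    for op in ops:
        state = step(state, op)
        trace.append(state)
    return trace
```

A run such as `run(["deposit", "deposit", "withdraw"])` stays within the focus and remains sound, whereas a third consecutive deposit enters “other” and the run is flagged unsound from then on.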


3.3  Finite  Focus  and  Test  Generation 

For our purposes, a system specification is a pair (S, T), where S is a state
machine description and T is a set of temporal logic constraints. S may be
unbounded; for instance, part of the state may be an integer.

To generate test cases using the method described in Sect. 2, we must be
able to analyze the specifications with a symbolic model checker. Figure 3
illustrates the steps to apply the reduction for finite focus, or RFF. For test case
generation, the state machine, S, is reflected as temporal logic constraints to
provide a description for subsequent mutation analysis. Any existing temporal
logic constraints, T in the figure, may be added to the reflected constraints which
describe the state machine.

Some finite number of states, focused around the initial state, are mapped
to states in the reduced specification. All other states are mapped to a single
“other” state. The source and destination of each transition are mapped likewise.
The function $RFF_T$ maps temporal logic constraints, and $RFF_S$ maps the state
machine. The two functions, along with constraint rewriting (CR) for soundness,
explained below, constitute RFF.


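[Figure 3: the system specs S and T are reduced by RFF, the constraints passing through $RFF_T$ to give $T_R$; M and CR then produce the mutant specs, which go to the model checker to yield counterexamples.]

Fig. 3. Specification Transformations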


$RFF_S$ also adds a separate state machine with the initial state sound.
Whenever the reduced state machine ends in the “other” state, this added machine
goes unsound. It remains unsound thereafter. This step yields a reduced state
machine, $S_R$.

$RFF_T$ yields reduced temporal logic constraints, $T_R$, but is less rigidly
determined than $RFF_S$. Together $S_R$ and $T_R$ correspond to the finite specifications of
Fig. 1.

To generate counterexamples, we repeatedly apply various mutation
operators, M in Fig. 3, to the temporal logic constraints. Then, in order to prevent
unsound counterexamples, we rewrite the constraints so they are always
satisfied when the state is unsound. This constraint rewriting, CR, yields mutated
constraints, $T'_R$. Together $S_R$ and $T'_R$ are given to the model checker, which
efficiently computes a number of counterexamples. We prove the soundness of our
reduction in Sect. 4 as the central result of this paper. Soundness for test
generation means that any counterexample of the reduced specification ($S_R$, $T_R$) is
a valid trace of the original state machine specification, S.


3.4  Preventing  Unsound  Counterexamples 

Along with the reductions $RFF_T$ and $RFF_S$, we must make sure no
counterexamples are produced if the state becomes unsound. We can prevent
counterexamples by modifying the temporal logic specifications so there is no violation
if the model is unsound. Here we give a set of rewriting rules which force any
temporal logic expression to evaluate to true (or false, as appropriate) if the
state is unsound.

Let  s be  the  variable  representing  soundness:  if  the  state  is  sound,  s is  true. 
If  the  machine  takes  an  unsound  transition,  s becomes  false.  Note  that  once  s 
becomes  false,  it  remains  false;  RFF  soundness  relies  on  this. 

Specifications  in  computation  tree  logic  (CTL)  [10]  are  changed  according  to 
the  following  rules,  which  we  refer  to  as  the  constraint  rewriting  rules,  CR.  If 
a CTL  formula  does  not  begin  with  a temporal  operator,  it  is  rewritten  as  an 
implication  so  that  it  has  the  value  True  when  the  soundness  variable  is  false. 
Otherwise,  the  temporal  operator  rewriting  rule  is  applied. 

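$$\mathit{CR}(f) = \begin{cases} \mathit{cr}(f, \mathit{True}) & \text{if } f \text{ begins with a temporal operator} \\ s \rightarrow \mathit{cr}(f, \mathit{True}) & \text{otherwise} \end{cases}$$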


Formulae must be rewritten recursively so that embedded temporal operators,
which refer to future states, are rewritten to be True when the soundness
variable becomes false in those future states. Constraint rewriting with a value,
$\mathit{cr}(f, v)$, tracks whether the formula has been negated. If the formula is a logical
negation, implication, or equivalence, the value is negated in rewriting some of
the subexpressions. Otherwise the subexpressions are rewritten with the value
unchanged.


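$$\begin{aligned}
\mathit{cr}(!f, v) &= \;!\,\mathit{cr}(f, {\sim}v) \\
\mathit{cr}(f \,\&\, g, v) &= \mathit{cr}(f, v) \,\&\, \mathit{cr}(g, v) \\
\mathit{cr}(f \mid g, v) &= \mathit{cr}(f, v) \mid \mathit{cr}(g, v) \\
\mathit{cr}(f \rightarrow g, v) &= \mathit{cr}(f, {\sim}v) \rightarrow \mathit{cr}(g, v) \\
\mathit{cr}(f \leftrightarrow g, v) &= (\mathit{cr}(f, {\sim}v) \rightarrow \mathit{cr}(g, v)) \,\&\, (\mathit{cr}(g, {\sim}v) \rightarrow \mathit{cr}(f, v))
\end{aligned}$$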


If  the  formula  is  a temporal  operator,  it  is  rewritten  so  the  expression  becomes 
True  (or  False)  when  the  soundness  variable  is  false.  The  operators  AG,  AF,  AX, 
EG,  EF,  and  EX  follow  these  patterns.  The  meta-variable  OP  represents  one  of 
the  six  operators. 

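$$\begin{aligned}
\mathit{cr}(\mathrm{OP}\, f, \mathit{True}) &= \mathrm{OP}\,(s \rightarrow \mathit{cr}(f, \mathit{True})) \\
\mathit{cr}(\mathrm{OP}\, f, \mathit{False}) &= \mathrm{OP}\,(s \,\&\, \mathit{cr}(f, \mathit{False}))
\end{aligned}$$

The operators A...U and E...U follow these patterns.

$$\begin{aligned}
\mathit{cr}(\mathrm{OP}\; g \;\mathrm{U}\; f, \mathit{True}) &= \mathrm{OP}\; g \;\mathrm{U}\; (s \rightarrow \mathit{cr}(f, \mathit{True})) \\
\mathit{cr}(\mathrm{OP}\; g \;\mathrm{U}\; f, \mathit{False}) &= \mathrm{OP}\; g \;\mathrm{U}\; (s \,\&\, \mathit{cr}(f, \mathit{False}))
\end{aligned}$$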

If  the  formula  is  none  of  the  above,  say,  a variable,  it  is  unchanged. 


$$\mathit{cr}(f, v) = f$$


For example, the following specification states that an instruction which
pushes one item on the stack of a Java(1) virtual machine increases the stack
size by one in the next state.

SPEC AG(instr=push1 -> AX(StkSize=PStkSize+1))

(1) Java is a trademark of Sun Microsystems, Inc.

Rewriting  for  soundness  yields  the  formula  below  which  reads:  if  the  current 
state  is  sound  and  the  instruction  pushes  one  item,  the  stack  is  one  larger  in  the 
next  state  if  it  is  (still)  sound. 

SPEC AG(s -> instr=push1 -> AX(s -> StkSize=PStkSize+1))

Our  models  and  specifications  for  a Java  virtual  machine  stack  are  in  App.  B. 
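The constraint rewriting rules can be sketched as a recursive function over a small formula tree. This is a minimal sketch, not the authors' implementation: the nested-tuple encoding, the tags "AU" and "EU" for the until operators, and the helpers `guard` and `show` are assumptions made for illustration, and the output is fully parenthesized.

```python
UNARY_TEMPORAL = {"AG", "AF", "AX", "EG", "EF", "EX"}
UNTIL = {"AU", "EU"}          # ("AU", g, f) encodes A[g U f]
S = "s"                       # the soundness variable

def guard(f, v):
    """s -> f when rewriting toward True, s & f when rewriting toward False."""
    return ("->", S, f) if v else ("&", S, f)

def cr(f, v):
    """Rewrite formula f; v records whether f sits under a negation."""
    if isinstance(f, str):                      # variable or atom: unchanged
        return f
    op = f[0]
    if op == "!":
        return ("!", cr(f[1], not v))
    if op in ("&", "|"):
        return (op, cr(f[1], v), cr(f[2], v))
    if op == "->":
        return ("->", cr(f[1], not v), cr(f[2], v))
    if op == "<->":
        return ("&", ("->", cr(f[1], not v), cr(f[2], v)),
                     ("->", cr(f[2], not v), cr(f[1], v)))
    if op in UNARY_TEMPORAL:
        return (op, guard(cr(f[1], v), v))
    if op in UNTIL:
        return (op, f[1], guard(cr(f[2], v), v))
    raise ValueError(f"unknown operator: {op}")

def CR(f):
    """Top-level rule: guard with s unless f begins with a temporal operator."""
    temporal = not isinstance(f, str) and f[0] in (UNARY_TEMPORAL | UNTIL)
    return cr(f, True) if temporal else ("->", S, cr(f, True))

def show(f):
    """Render a formula tree in SMV-like syntax."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "!":
        return f"!{show(f[1])}"
    if op in UNARY_TEMPORAL:
        return f"{op} {show(f[1])}"
    if op in UNTIL:
        return f"{op[0]}[{show(f[1])} U {show(f[2])}]"
    return f"({show(f[1])} {op} {show(f[2])})"

spec = ("AG", ("->", "instr=push1", ("AX", "StkSize=PStkSize+1")))
rewritten = show(CR(spec))
```

Applied to the stack example, the sketch guards both the AG body and the AX body with the soundness variable, which agrees with the rewritten specification given above up to parenthesization.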

We now want to argue a theorem that states, roughly, that when s is false, every
constraint evaluates to true. First we define a finite execution path of a state
machine, over which constraints are evaluated. We begin with a reference definition
of a state machine.

Definition 1 (after [4]). A state machine $S$ is a 4-tuple $(\Sigma, s_0, E^m, T)$, where
$\Sigma$ is a set of states, $s_0 \in \Sigma$ is the initial state, $E^m$ is the set of input events,
and $T$ describes the state transitions. The transitions $T$ are a relation $s \times e \rightarrow s$
where $s \in \Sigma$ and $e \in E^m$.

Since  T is  a relation,  this  definition  includes  non-deterministic  state  machines. 
Input  events  correspond  to  monitored  variables  in  [4,5]. 

Definition 2 (after [10]). A finite path is a finite sequence of states
$(s_0, s_1, \ldots, s_n)$ such that $\forall i \mid 0 \le i < n \Rightarrow (s_i, s_{i+1}) \in T$.

Definition  3.  A finite  path  is  irreducible  if  after  removing  the  last  state,  the 
path  does  not  violate  any  constraint. 

That is, a finite path $(s_0, s_1, \ldots, s_n)$ is irreducible if $(s_0, s_1, \ldots, s_{n-1})$ does
not violate any constraint.
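Definition 3 can be sketched in code by abstracting constraint checking into a predicate. The predicate `violates`, which reports whether some constraint is false on a given path, and the helper names below are assumptions for illustration; a real model checker evaluates temporal logic constraints rather than a Python function.

```python
def is_irreducible(path, violates):
    """A non-empty path is irreducible if dropping its last state
    leaves a path that violates no constraint."""
    return len(path) > 0 and not violates(path[:-1])

def shortest_violation(path, violates):
    """Return the shortest non-empty prefix of `path` that violates a
    constraint, or None if no prefix does."""
    for n in range(1, len(path) + 1):
        if violates(path[:n]):
            return path[:n]
    return None

# Hypothetical constraint: the state "bad" must never be visited.
def visits_bad(path):
    return any(state == "bad" for state in path)

full = ["ok", "ok", "bad", "ok"]
short = shortest_violation(full, visits_bad)
```

In this example the full path continues past the violation and is therefore not irreducible, while its shortest violating prefix is.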

Theorem 1. Suppose that CR is applied to a set of constraints, $T_R$. Any
irreducible path that violates a constraint in $CR(T_R)$ has s equal to true in each state.

We  give  a proof  sketch  here;  see  App.  A for  a more  formal  proof.  We  first 
argue  the  rules  for  universally  quantified  expressions.  The  AG  rule  describes 
invariants  on  states;  clearly  this  rule  exempts  a particular  state  if  s is  false  in 
that  state.  The  AF  rule  describes  a property  of  some  future  state;  if  s is  false,  it 
remains  false  in  all  future  states,  and  therefore  the  property  is  true  for  all  future 
states.  The  AX  rule  is  a special  case  of  the  AF  rule  where  the  future  state  is 
simply  the  next  state. 

Finally, the meaning of A[g U f] requires f to become true eventually and for
g to hold until it does. If s is false, the second rewritten predicate $s \rightarrow f$ holds,
thus satisfying the requirement that the second predicate becomes true eventually. Also, when
s is false, the second predicate is true immediately, which satisfies the “until”
condition, too.

In case we need to rewrite the expression to be false, for instance, if the
specification is SPEC !AG p, the rewriting rules for AG, AF, and A...U force
the value to be false when s is false. For AX we need the requirement on CTL
structures that every state has at least one outgoing transition. Otherwise, if a
state had no next state, AX False would be vacuously true.

The  rules  for  rewriting  existential  quantifiers  to  be  true  or  false  follow  simi- 
larly, except  for  rewriting  EX  expressions  to  be  true.  If  a state  had  no  next  state, 
EX  True  would  be  false,  since  there  is  no  next  state  at  all.  Since  every  state  has 
a next  state,  the  rewriting  works  in  all  cases. 

4 Proof  of  Reduction  Soundness 

In  this  section  we  formally  define  the  properties  which  any  reduction  for  finite 
focus  must  have.  We  define  soundness  for  our  ultimate  purpose  of  test  generation 
and  prove  that  counterexamples  from  finite  focus  reductions  are  sound. 

Definition 4. A trace of an execution is a list of inputs to a state machine
and the resultant states. Formally $t = [(e_1, s_1), \ldots, (e_n, s_n)]$ is a trace of $S$ if
$\forall i \mid 1 \le i \le n \Rightarrow T : (s_{i-1}, e_i) \rightarrow s_i$.

Note that this definition allows non-deterministic state machines. However, any
particular trace is completely unambiguous, even if the state machine is
non-deterministic. That is, the particular sequence of transitions yielding a trace may
be unambiguously reconstructed from the trace: each transition is $(s_{i-1}, e_i) \rightarrow s_i$
where $1 \le i \le n$.
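Definition 4 translates directly into a membership check over an explicit transition relation. In this sketch the relation is assumed to be a finite set of (source, event, target) triples, and the name `is_trace` and the toy machine are ours.

```python
def is_trace(transitions, s0, trace):
    """Check that each (event, state) pair follows from the previous state.

    `transitions` is a set of (source, event, target) triples, so a source
    and event may have several targets (a non-deterministic machine).
    """
    prev = s0
    for event, state in trace:
        if (prev, event, state) not in transitions:
            return False
        prev = state
    return True

# Hypothetical non-deterministic machine: "go" from a leads to b or c.
T = {("a", "go", "b"), ("a", "go", "c"), ("b", "go", "a")}
```

Because the trace records the resulting state at every step, the check never has to guess which of several possible transitions was taken.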

Definition 5. A counterexample, c, from a state machine, S, and temporal logic
constraints, T, is an irreducible trace of S with a constraint violation of T.

In  other  words,  model  checkers  produce  counterexamples  just  long  enough 
to  demonstrate  some  combination  of  inputs  and  states  which  cause  one  or  more 
constraints  to  be  false.  If  the  model  checker  produced  counterexamples  longer 
than  necessary,  it  may  find  a contradiction  in  a sound  state,  then  arbitrarily 
continue  the  trace  and  eventually  report  an  unsound  state. 

Some  model  checkers  can  generate  counterexamples  quite  efficiently.  The  ac- 
tual counterexamples  often  elide  much  of  the  redundant  trace  information,  such 
as  values  which  stay  the  same  in  a new  state.  When  we  say  that  a counterex- 
ample is  produced  from  some  S and  T,  we  mean  a trace  which  demonstrates  a 
constraint  violation  is  found  and  reported. 

Definition 6. A state machine reduction for finite focus, $RFF_S$, has the following properties:

1. The reduction accurately copies the initial state. It may also copy some finite
number of states around the initial state.

(a) The initial state is mapped one-to-one: $s_0 \xrightarrow{RFF_S} s_0$.

(b) Other states are mapped one-to-one, also.

(c) Any remaining states are mapped to a new state, "other": $s_i \xrightarrow{RFF_S} \text{other}$.

2. Input events map identically: $e \xrightarrow{RFF_S} e$.

3. The sources and destinations of transitions are mapped according to the
above: $T : (s, e) \to s' \stackrel{RFF_S}{\Longrightarrow} T_R : (RFF_S(s), e) \to RFF_S(s')$.

4. The reduction adds a new "soundness" variable with two states: sound and
unsound.

(a) The initial state is sound.

(b) For every unsound transition ($T_R : (s, e) \to \text{other}$), add a transition
conditioned on the source state and event from sound to unsound; more
formally, $(\text{sound}, s \times e) \to \text{unsound}$.

(c) Any other transition from sound remains in sound.

(d) All transitions from unsound remain in unsound.
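The properties above can be sketched for a small unbounded machine. This is a minimal illustration under stated assumptions, not the authors' implementation: the original machine is a hypothetical deterministic counter (`counter_step`), the focus keeps states 0 to 3, and, as a simplification, the reduced machine stays in "other" once it gets there (by then the soundness variable is already unsound and stays so by item 4(d)).

```python
from typing import Callable, FrozenSet, List, Tuple, Union

OTHER = "other"  # single state standing in for everything outside the focus
ReducedState = Union[int, str]


def rff_state(state: int, kept: FrozenSet[int]) -> ReducedState:
    """Items 1(a)-(c): kept states map to themselves, the rest to "other"."""
    return state if state in kept else OTHER


def reduced_step(step: Callable[[int, str], int], kept: FrozenSet[int],
                 state: ReducedState, sound: bool,
                 event: str) -> Tuple[ReducedState, bool]:
    """One step of the reduced machine together with its soundness variable."""
    if state == OTHER:
        # Simplification (assumption): remain in "other"; item 4(d) keeps it unsound.
        return OTHER, False
    next_state = rff_state(step(state, event), kept)   # item 3
    # Items 4(b) and 4(c): a transition into "other" makes the flag unsound,
    # any other transition leaves the flag as it was.
    return next_state, sound and next_state != OTHER


def counter_step(n: int, event: str) -> int:
    """Hypothetical original machine: an unbounded counter."""
    return n + 1 if event == "inc" else max(n - 1, 0)


KEPT = frozenset({0, 1, 2, 3})


def run(events: List[str]) -> List[Tuple[ReducedState, bool]]:
    """Run the reduced counter from the (sound) initial state 0."""
    state, sound = 0, True
    history = []
    for event in events:
        state, sound = reduced_step(counter_step, KEPT, state, sound, event)
        history.append((state, sound))
    return history
```

A run that never leaves the focus stays sound throughout; a fourth consecutive `"inc"` drives the state to "other" and the flag to unsound, where it remains.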

The temporal logic reduction $T \xrightarrow{RFF_T} T_R$ is nearly unconstrained. Theoretically,
soundness is preserved by any reduction, as long as the resulting constraints
are valid (e.g., no undefined variables or constants) in the state machine, $S_R$. In
Sect. 5 we describe practical requirements to achieve coverage.

Definition 7. A state $s$ in $S_R$ is sound if it is faithfully copied to $S_R$.

That is, if $s \in S \xrightarrow{RFF_S} s \in S_R$, $s$ is sound.

Lemma  1.  Counterexamples  include  no  unsound  states. 

Proof:  Assume  there  is  a first  unsound  state  in  the  counterexample  trace. 

1.  There  was  no  contradiction  in  a previous  state,  by  Definition  5. 

2.  Since  this  state  is  unsound,  it  is  not  part  of  a contradiction,  by  Theorem  1. 

3.  Since  subsequent  states  are  also  unsound  (part  4d  of  the  definition  of  RFF), 
they  cannot  be  part  of  a contradiction,  either. 

So  there  is  no  contradiction  at  all.  But  this  conflicts  with  the  definition  that 
counterexamples  have  contradictions.  Therefore  there  is  no  first  unsound  state. 

Complementing  Lemma  1,  we  argue  that  all  sound  transitions  from  soundly 
reachable  states  may  be  used.  Consider  all  sound  states  which  are  reachable 
from  the  initial  state  through  other  sound  states.  Any  transition  from  a soundly 
reachable  sound  state  to  another  sound  state  may  appear  in  counterexamples. 

To  say  which  transitions  actually  appear  in  counterexamples  would  require 
us  to  characterize  the  state  machine  duplication,  possible  mutation  operators, 
the  temporal  logic  reduction  function,  and  the  model  checker’s  counterexample 
selection  scheme.  We  discuss  a related  issue,  mutation  adequacy,  in  Sect.  5. 

Theorem  2.  No  soundly  reachable  sound  transition  is  necessarily  excluded  from 
counterexamples. 

Proof: 

1.  By  definition  the  initial  state  is  reachable.  Since  it  is  also  sound,  it  is  soundly 
reachable.  Any  sound  state  reachable  from  a soundly  reachable  state  is 
soundly  reachable,  too. 
2. Since the machine always remains sound, any sound transition from a soundly
reachable state is not precluded by our scheme.

We now present our main result, namely that any reduction following the
above is sound, or that it produces only sound counterexamples. In other words,
the counterexample trace can be mapped back to a (valid) trace in the original
specification with a simple inverse mapping $RFF_S^{-1}$.

The inverse mapping from sound states $s_R$ in $S_R$ back to states $s$ in $S$ is
$RFF_S^{-1} = \{(s_R, s) \mid s_R \neq \text{other} \land (s, s_R) \in RFF_S\}$. Note that there is no
mapping for the unsound state, "other." Since $RFF_S$ is otherwise one-to-one,
$RFF_S^{-1}$ is a (partial) function. Events in $S_R$ are the same as those in $S$, so they
map identically. The inverse mapping of a trace is the inverse mapping of each
state in the trace.
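The inverse mapping can be sketched as follows. The sketch assumes the forward mapping is available as a finite dictionary (a finite excerpt of what may be an infinite mapping); the names `invert`, `inverse_trace`, and the example mapping `RFF` are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

OTHER = "other"


def invert(rff: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Build the partial inverse {(s_R, s) | s_R != other and (s, s_R) in rff}."""
    inverse: Dict[Hashable, Hashable] = {}
    for s, s_r in rff.items():
        if s_r == OTHER:
            continue  # no inverse mapping for the unsound state
        if s_r in inverse:
            raise ValueError("mapping is not one-to-one on sound states")
        inverse[s_r] = s
    return inverse


def inverse_trace(inverse: Dict[Hashable, Hashable],
                  reduced_trace: List[Tuple[str, Hashable]]
                  ) -> Optional[List[Tuple[str, Hashable]]]:
    """Map each state of a reduced trace back; events map identically.

    Returns None if the trace contains a state with no inverse ("other").
    """
    if any(s_r not in inverse for _, s_r in reduced_trace):
        return None
    return [(event, inverse[s_r]) for event, s_r in reduced_trace]


# Hypothetical excerpt of a forward mapping for a stack-size model.
RFF = {0: "size0", 1: "size1", 2: "size2", 3: "size3", 4: OTHER, 5: OTHER}
INVERSE = invert(RFF)
```

Returning `None` for a trace through "other" mirrors the fact that the inverse is only a partial function.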

Theorem 3. Any counterexample, $c$, produced from $S_R = RFF_S(S)$ and
$T'_R = CR(M(T_R))$ is sound.

Proof: By the definition of counterexample soundness, we must prove
$RFF_S^{-1}(c)$ is a trace of $S$. If a state appears in $c$, it is sound, by Lemma 1.
Since all states are sound, $RFF_S^{-1}$ maps them back to $S$, and the $S_R$ transitions
implied by $c$ are also transitions in $S$.

5 Mutation  Analysis  and  Utility 

Definition  6 does  not  require  RFF  to  map  any  additional  states  beyond  the 
initial  state.  However,  the  more  states  which  are  mapped  one-to-one,  the  more 
traces  there  are  in  the  reduced  model.  To  be  useful,  the  additional  states  must 
also  be  soundly  reachable.  (If  any  state  is  reachable  only  through  an  unsound 
state,  no  counterexample  includes  it.)  Thus  an  analyst  is  free  to  reduce  a speci- 
fication to  as  few  states  as  necessary  for  effective  model  checking,  but  may  wish 
to  make  the  reduced  specification  as  large  as  possible. 

Theorem 3 describes conditions under which counterexamples generated by
a model checker from $S_R$ are traces of $S$. Since $RFF_T$ and $M$ precede $CR$, they
are unconstrained by soundness; indeed, any transforms may be used without
invalidating Theorem 3. (Intuitively, as long as the temporal logic constraints are
rewritten with $CR$, they can only induce more or fewer sound counterexamples.)

However, we wish to use the counterexamples for test cases for an implementation
of $S$ as outlined in Sect. 2. Clearly, the utility of test cases produced by
this method depends on the transform $RFF_T$ and the set of mutation operators
$\mathcal{M}$ from which $M$ is drawn. In the remainder of this section, we describe
constraints on $RFF_T$ which yield coverage with respect to $\mathcal{M}$.

We first describe the ideal situation, which is different from the order depicted
in Fig. 3, and then introduce practical considerations. Consider applying some
mutation operator $M \in \mathcal{M}$ to $T$, which produces $T'$. Then $T'$ is transformed via
$RFF_T$ and $CR$ to produce $T'_R$. The model checker is run on $(S_R, T'_R)$, (possibly)
producing a counterexample $t_R$. The existence of counterexample traces may
be characterized as follows:


If there is a trace $t$ in $S$ such that

1. $t$ is a counterexample$^2$ with respect to $T'$, and

2. $RFF_S(t)$ is a sound trace in $S_R$,

then some $t_R$ does in fact exist for the model checker to find, and the trace
$RFF_S^{-1}(t_R)$ is a counterexample with respect to $T'$.

Informally, this says that the set of counterexamples generated by the model
checker from $S_R$ is as close to being mutation adequate with respect to $S$, $T$,
and $\mathcal{M}$ as possible. In other words, if a mutant $T'$ can be distinguished by a
sound trace from $S_R$, the model checker finds such a sound trace.

From a practical perspective, the chief drawback of the above characterization
is that the reduction $RFF_T$ is applied after each mutation operator is
applied. If $RFF_T$ can be completely automated for some application, this is not
a serious problem. If, however, some human intervention is required to apply
$RFF_T$, multiple transformations are an expensive proposition. Instead, it may
be more practical to transform $T$ with $RFF_T$ once, then apply mutation operators
to $T_R$ instead of $T$. The result of test generation is then a mutation adequate
set of tests with respect to $S_R$, $T_R$, and $\mathcal{M}$, rather than $S$, $T$, and $\mathcal{M}$.

6 Example 

To validate the above proof, we apply finite focus to an example: the stack
of a Java virtual machine. Abstracting the stack to just the number of items
on the stack (stack size) and grouping instructions into those which push one
item (push1), pop one item (pop1), or pop two items (pop2) still leaves an
unbounded model much like the bank balance at the top of Fig. 2. It also includes
specifications such as "push1 then pop1 leaves the stack unchanged."

We applied finite focus to get a usable model of a stack with zero, one,
two, three, or "many" items plus a variable which goes unsound if the stack
size exceeds three. We used two mutation operators. M1 changes constants in
phrases such as VAR = CONST, and M2 negates boolean expressions. M1 gave
279 mutants, and M2 gave 129 mutants, for a total of 408. These produced 254
counterexamples. Combining duplicates and discarding prefixes yielded 9 unique
tests with lengths from three to seven operations. By comparison, exhaustive
enumeration yields 45 tests of length seven.
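The step of combining duplicates and discarding prefixes can be sketched as below. This is an illustrative reading of that step, assuming each counterexample is represented as a sequence of instruction names; the function name `minimize` is hypothetical and the paper does not describe its own tooling for this step.

```python
from typing import Iterable, List, Sequence, Tuple


def minimize(tests: Iterable[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Drop duplicate tests and tests that are proper prefixes of longer tests."""
    unique = {tuple(t) for t in tests}  # combining duplicates
    kept = [
        t for t in unique
        # discard t if some strictly longer test starts with t
        if not any(len(u) > len(t) and u[:len(t)] == t for u in unique)
    ]
    return sorted(kept)
```

A test that is a prefix of a longer test is redundant because running the longer test exercises every step of the shorter one.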

The  stack  specifications  are  in  App.  B,  and  the  nine  tests  are  in  App.  C. 

7 Future  Work 

Since the goal is test generation, other soundness definitions and thus useful
counterexamples are possible. In the bank account example in Fig. 2, we could
map counterexamples with the "other" state to a loose test that the balance is
more than two. The inverse mapping, $RFF_S^{-1}$, would be more complex, but it
may allow more information to be derived from reductions.

Footnote 2. Note that, by assumption, we have no efficient means of computing such a
counterexample directly from $S$ and $T'$. If there were an efficient means, we could simply run
the model checker on $S$ instead of $S_R$ and therefore avoid the finite focus transform.

We  plan  to  investigate  different  sets  of  mutation  operators.  These  experi- 
ments, along  with  theoretical  considerations  of  predicate  test  domination  [17], 
should  help  us  develop  good  classes  of  operators. 

8 Conclusions 

Previously, we showed how to use a model checker to measure test set coverage
[1], and to develop mutation adequate tests for software specifications [2].
Mutation analysis in the finite domain of the model checker avoids many problems
which plague program mutation analysis. But applying it depends critically on
the feasibility of model checking specifications for realistic software systems.

Here  we  address  model  checking  feasibility  for  test  generation,  and  present 
a reduction  called  finite  focus  for  it.  We  define  soundness  for  test  generation: 
counterexamples  generated  from  the  reduced  specification  are  test  cases  for  the 
original  specification.  We  prove  that  finite  focus  is  sound,  and  experimentally 
show  that  it  soundly  reduces  an  unbounded  model  to  a model  which  yields  a 
small,  yet  mutation  adequate,  test  set. 

Soundness  only  constrains  the  part  of  the  finite  focus  reduction  that  trans- 
forms the  state  machine  and  rewrites  the  temporal  logic  constraints.  To  maxi- 
mize utility,  we  develop  constraints  on  the  transform  of  the  temporal  logic  aspect 
which  improve  mutation  adequacy  in  the  original  specification. 

A Proof that Rewrites Prevent Checking


In this appendix we prove that when rewritten with the constraint rewriting
rules, CR, any expression evaluates to True or False as appropriate in any
state in which s is False. We use definitions of temporal logic operators from
Clarke et al. [10].

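Proof that $\mathit{cr}(\mathrm{AX}\,f, \mathit{True}) = \mathrm{AX}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True when $s$ is False.

\[
\begin{aligned}
u \models \mathrm{AX}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{AX}\,\mathit{True} \\
&= \forall v \mid (u, v) \in T \to v \models \mathit{True} \\
&= \forall v \mid (u, v) \in T \to \mathit{True} \\
&= \forall v \mid \mathit{True} \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{AX}\,f, \mathit{False}) = \mathrm{AX}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{AX}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{AX}\,\mathit{False} \\
&= \forall v \mid (u, v) \in T \to v \models \mathit{False} \\
&= \forall v \mid (u, v) \in T \to \mathit{False} \\
&= \forall v \mid (u, v) \notin T \\
&\quad \text{and since every state has at least one outgoing transition} \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EX}\,f, \mathit{True}) = \mathrm{EX}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True.

\[
\begin{aligned}
u \models \mathrm{EX}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{EX}\,\mathit{True} \\
&= \exists v \mid (u, v) \in T \land v \models \mathit{True} \\
&= \exists v \mid (u, v) \in T \land \mathit{True} \\
&= \exists v \mid (u, v) \in T \\
&\quad \text{and since every state has at least one outgoing transition} \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EX}\,f, \mathit{False}) = \mathrm{EX}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{EX}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{EX}\,\mathit{False} \\
&= \exists v \mid (u, v) \in T \land v \models \mathit{False} \\
&= \exists v \mid (u, v) \in T \land \mathit{False} \\
&= \exists v \mid \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]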
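Proof that $\mathit{cr}(\mathrm{A}[g\ \mathrm{U}\ f], \mathit{True}) = \mathrm{A}[g\ \mathrm{U}\ (s \to \mathit{cr}(f, \mathit{True}))]$ yields True. Note that we are
quantifying over paths; $s_i$ means "the $i$th state of the path."

\[
\begin{aligned}
u \models \mathrm{A}[g\ \mathrm{U}\ (\mathit{False} \to \mathit{cr}(f, \mathit{True}))] &= u \models \mathrm{A}[g\ \mathrm{U}\ \mathit{True}] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\exists i\,[i \ge 0 \land s_i \models \mathit{True} \land \forall j\,[0 \le j < i \to s_j \models g]]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\exists i\,[i \ge 0 \land \forall j\,[0 \le j < i \to s_j \models g]]] \\
&\quad \text{choosing } i = 0 \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[0 \ge 0 \land \forall j\,[0 \le j < 0 \to s_j \models g]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\forall j\,[\mathit{False} \to s_j \models g]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\mathit{True}] \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{A}[g\ \mathrm{U}\ f], \mathit{False}) = \mathrm{A}[g\ \mathrm{U}\ (s \mathbin{\&} \mathit{cr}(f, \mathit{False}))]$ yields False.

\[
\begin{aligned}
u \models \mathrm{A}[g\ \mathrm{U}\ (\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False}))] &= u \models \mathrm{A}[g\ \mathrm{U}\ \mathit{False}] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\exists i\,[i \ge 0 \land s_i \models \mathit{False} \land \forall j\,[0 \le j < i \to s_j \models g]]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\exists i\,[i \ge 0 \land \mathit{False} \land \forall j\,[0 \le j < i \to s_j \models g]]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\exists i\,[\mathit{False}]] \\
&= \forall\,\mathrm{path}(s_0 = u, s_1, \ldots, s_n)\,[\mathit{False}] \\
&= \mathit{False}
\end{aligned}
\]

The proofs of $\mathit{cr}(\mathrm{E}[g\ \mathrm{U}\ f], \mathit{True}) = \mathrm{E}[g\ \mathrm{U}\ (s \to \mathit{cr}(f, \mathit{True}))]$ and $\mathit{cr}(\mathrm{E}[g\ \mathrm{U}\ f], \mathit{False}) =
\mathrm{E}[g\ \mathrm{U}\ (s \mathbin{\&} \mathit{cr}(f, \mathit{False}))]$ are the same as the proofs of $\mathrm{A}[g\ \mathrm{U}\ \mathit{True}]$ and $\mathrm{A}[g\ \mathrm{U}\ \mathit{False}]$
respectively, except that they use $\exists\,\mathrm{path}$ instead of $\forall\,\mathrm{path}$.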

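Proof that $\mathit{cr}(\mathrm{AF}\,f, \mathit{True}) = \mathrm{AF}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True.

\[
\begin{aligned}
u \models \mathrm{AF}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{A}[\mathit{True}\ \mathrm{U}\ (\mathit{False} \to \mathit{cr}(f, \mathit{True}))] \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{AF}\,f, \mathit{False}) = \mathrm{AF}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{AF}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{A}[\mathit{True}\ \mathrm{U}\ (\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False}))] \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EF}\,f, \mathit{True}) = \mathrm{EF}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True.

\[
\begin{aligned}
u \models \mathrm{EF}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{E}[\mathit{True}\ \mathrm{U}\ (\mathit{False} \to \mathit{cr}(f, \mathit{True}))] \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EF}\,f, \mathit{False}) = \mathrm{EF}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{EF}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{E}[\mathit{True}\ \mathrm{U}\ (\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False}))] \\
&= \mathit{False}
\end{aligned}
\]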
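Proof that $\mathit{cr}(\mathrm{AG}\,f, \mathit{True}) = \mathrm{AG}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True.

\[
\begin{aligned}
u \models \mathrm{AG}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{AG}\,\mathit{True} \\
&= u \models \neg\mathrm{EF}\,\neg\mathit{True} \\
&= u \models \neg\mathrm{EF}\,\mathit{False} \\
&= u \models \neg\mathrm{E}[\mathit{True}\ \mathrm{U}\ \mathit{False}] \\
&= u \models \neg\mathit{False} \\
&= u \models \mathit{True} \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{AG}\,f, \mathit{False}) = \mathrm{AG}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{AG}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{AG}\,\mathit{False} \\
&= u \models \neg\mathrm{EF}\,\neg\mathit{False} \\
&= u \models \neg\mathrm{EF}\,\mathit{True} \\
&= u \models \neg\mathrm{E}[\mathit{True}\ \mathrm{U}\ \mathit{True}] \\
&= u \models \neg\mathit{True} \\
&= u \models \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EG}\,f, \mathit{True}) = \mathrm{EG}\,(s \to \mathit{cr}(f, \mathit{True}))$ yields True.

\[
\begin{aligned}
u \models \mathrm{EG}\,(\mathit{False} \to \mathit{cr}(f, \mathit{True})) &= u \models \mathrm{EG}\,\mathit{True} \\
&\quad \text{similar to the } \mathrm{AG}\,\mathit{True} \text{ case} \\
&= \mathit{True}
\end{aligned}
\]

Proof that $\mathit{cr}(\mathrm{EG}\,f, \mathit{False}) = \mathrm{EG}\,(s \mathbin{\&} \mathit{cr}(f, \mathit{False}))$ yields False.

\[
\begin{aligned}
u \models \mathrm{EG}\,(\mathit{False} \mathbin{\&} \mathit{cr}(f, \mathit{False})) &= u \models \mathrm{EG}\,\mathit{False} \\
&\quad \text{similar to the } \mathrm{AG}\,\mathit{False} \text{ case} \\
&= \mathit{False}
\end{aligned}
\]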


We proved that expressions whose outermost operators are temporal operators
always evaluate to the desired True or False value when rewritten according
to the rules. However, specifications may be some boolean function of temporal
operator expressions. For instance, for a specification such as SPEC ! AG p
to evaluate to True, the temporal operator expression must evaluate to False.
(Thus the need for rewriting so the expression will be True or it will be False.)
Assuming subsequent rewrites make the subformulae True or False, we show
that these rewrites make the formula True or False, as appropriate.
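Proof that $\mathit{cr}(!f, v) = \;!\,\mathit{cr}(f, \neg v)$ yields True or False appropriately.

\[
\begin{aligned}
\mathit{cr}(!f, \mathit{True}) &= \;!\,\mathit{cr}(f, \neg\mathit{True}) \\
&= \;!\,\mathit{cr}(f, \mathit{False}) \\
&= \;!\,\mathit{False} \\
&= \mathit{True} \\[4pt]
\mathit{cr}(!f, \mathit{False}) &= \;!\,\mathit{cr}(f, \neg\mathit{False}) \\
&= \;!\,\mathit{cr}(f, \mathit{True}) \\
&= \;!\,\mathit{True} \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(f \mathbin{\&} g, v) = \mathit{cr}(f, v) \mathbin{\&} \mathit{cr}(g, v)$ yields True or False appropriately.

\[
\begin{aligned}
\mathit{cr}(f \mathbin{\&} g, \mathit{True}) &= \mathit{cr}(f, \mathit{True}) \mathbin{\&} \mathit{cr}(g, \mathit{True}) \\
&= \mathit{True} \mathbin{\&} \mathit{True} \\
&= \mathit{True} \\[4pt]
\mathit{cr}(f \mathbin{\&} g, \mathit{False}) &= \mathit{cr}(f, \mathit{False}) \mathbin{\&} \mathit{cr}(g, \mathit{False}) \\
&= \mathit{False} \mathbin{\&} \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(f \mid g, v) = \mathit{cr}(f, v) \mid \mathit{cr}(g, v)$ yields True or False appropriately.

\[
\begin{aligned}
\mathit{cr}(f \mid g, \mathit{True}) &= \mathit{cr}(f, \mathit{True}) \mid \mathit{cr}(g, \mathit{True}) \\
&= \mathit{True} \mid \mathit{True} \\
&= \mathit{True} \\[4pt]
\mathit{cr}(f \mid g, \mathit{False}) &= \mathit{cr}(f, \mathit{False}) \mid \mathit{cr}(g, \mathit{False}) \\
&= \mathit{False} \mid \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]

Proof that $\mathit{cr}(f \to g, v) = \mathit{cr}(f, \neg v) \to \mathit{cr}(g, v)$ yields True or False appropriately.

\[
\begin{aligned}
\mathit{cr}(f \to g, \mathit{True}) &= \mathit{cr}(f, \neg\mathit{True}) \to \mathit{cr}(g, \mathit{True}) \\
&= \mathit{cr}(f, \mathit{False}) \to \mathit{cr}(g, \mathit{True}) \\
&= \mathit{False} \to \mathit{True} \\
&= \mathit{True} \\[4pt]
\mathit{cr}(f \to g, \mathit{False}) &= \mathit{cr}(f, \neg\mathit{False}) \to \mathit{cr}(g, \mathit{False}) \\
&= \mathit{cr}(f, \mathit{True}) \to \mathit{cr}(g, \mathit{False}) \\
&= \mathit{True} \to \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]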
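Proof of $\mathit{cr}(f \leftrightarrow g, v) = (\mathit{cr}(f, \neg v) \to \mathit{cr}(g, v)) \mathbin{\&} (\mathit{cr}(g, \neg v) \to \mathit{cr}(f, v))$.

\[
\begin{aligned}
\mathit{cr}(f \leftrightarrow g, \mathit{True}) &= (\mathit{cr}(f, \neg\mathit{True}) \to \mathit{cr}(g, \mathit{True})) \mathbin{\&} (\mathit{cr}(g, \neg\mathit{True}) \to \mathit{cr}(f, \mathit{True})) \\
&= (\mathit{cr}(f, \mathit{False}) \to \mathit{cr}(g, \mathit{True})) \mathbin{\&} (\mathit{cr}(g, \mathit{False}) \to \mathit{cr}(f, \mathit{True})) \\
&= (\mathit{False} \to \mathit{True}) \mathbin{\&} (\mathit{False} \to \mathit{True}) \\
&= \mathit{True} \mathbin{\&} \mathit{True} \\
&= \mathit{True} \\[4pt]
\mathit{cr}(f \leftrightarrow g, \mathit{False}) &= (\mathit{cr}(f, \neg\mathit{False}) \to \mathit{cr}(g, \mathit{False})) \mathbin{\&} (\mathit{cr}(g, \neg\mathit{False}) \to \mathit{cr}(f, \mathit{False})) \\
&= (\mathit{cr}(f, \mathit{True}) \to \mathit{cr}(g, \mathit{False})) \mathbin{\&} (\mathit{cr}(g, \mathit{True}) \to \mathit{cr}(f, \mathit{False})) \\
&= (\mathit{True} \to \mathit{False}) \mathbin{\&} (\mathit{True} \to \mathit{False}) \\
&= \mathit{False} \mathbin{\&} \mathit{False} \\
&= \mathit{False}
\end{aligned}
\]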
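The rewriting rules proved in this appendix can be collected into one recursive function. The sketch below is an illustration, not the authors' tool: formulas are encoded as nested tuples (a hypothetical encoding), the soundness variable is the atom `Sound`, and atomic propositions are assumed to be left unchanged, which is inferred from the shape of the rewritten specifications in App. B rather than stated by the rules above.

```python
from typing import Tuple

Formula = Tuple  # nested tuples, e.g. ("AG", ("atom", "p")); hypothetical encoding

S = ("atom", "Sound")  # the soundness variable s
UNARY_TEMPORAL = {"AX", "EX", "AF", "EF", "AG", "EG"}


def cr(formula: Formula, value: bool) -> Formula:
    """Rewrite `formula` so it evaluates to `value` wherever s is False."""
    op = formula[0]
    guard = "imp" if value else "and"   # s -> ... forces True, s & ... forces False
    if op == "atom":
        return formula                  # assumption: atoms are not rewritten
    if op in UNARY_TEMPORAL:
        return (op, (guard, S, cr(formula[1], value)))
    if op in ("AU", "EU"):              # ("AU", g, f) encodes A[g U f]
        g, f = formula[1], formula[2]
        return (op, g, (guard, S, cr(f, value)))
    if op == "not":
        return ("not", cr(formula[1], not value))
    if op in ("and", "or"):
        return (op, cr(formula[1], value), cr(formula[2], value))
    if op == "imp":
        return ("imp", cr(formula[1], not value), cr(formula[2], value))
    if op == "iff":
        f, g = formula[1], formula[2]
        return ("and",
                ("imp", cr(f, not value), cr(g, value)),
                ("imp", cr(g, not value), cr(f, value)))
    raise ValueError(f"unknown operator: {op}")
```

Applied with target value True to a formula of the form AG (a & b -> AX c), the function produces AG (Sound -> (a & b -> AX (Sound -> c))), the same shape as the push and pop specifications in App. B.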


B Java  Virtual  Machine  Stack 


This  section  gives  the  model  and  specification  we  used  to  generate  tests.  We 
enumerate  the  stack  size.  Following  this  we  give  a second  SMV  file  which  models 
the  stack  size  as  a number. 

-- $Id: javaStack.smv,v 1.2 1999/08/05 13:50:20 black Exp $
-- *created  "Fri Jun 26 11:20:23 1998" *by "Paul E. Black"
-- *modified "Thu Aug  5 09:46:25 1999" *by "Paul E. Black"

-- first try at state machine abstraction of
--   Java Smart Card virtual machine
-- this just models the first few places of the operand stack

MODULE main
VAR
  -- system inputs
  instr : {in_push1, in_pop1, in_pop2};

  -- internal states
  Sound : boolean;
  StackSize : {size0, size1, size2, size3, sizeBig,
               sizeUndefined};

-- SKIMP stack overflow is an exception which
-- is not caught and terminates the program.

ASSIGN
  init(Sound) := 1;  -- state begins sound
  next(Sound) := case
      -- abstraction loses accuracy
      StackSize=size3 & instr=in_push1 : 0;
      1 : Sound;  -- otherwise soundness is unchanged
    esac;

  -- allow only instructions which don't cause stack underflow
  -- Java compilers should ensure this
  init(instr) := in_push1;
  next(instr) := case
      next(StackSize)=size0 : in_push1;
      next(StackSize)=size1 : {in_push1, in_pop1};
      1 : {in_push1, in_pop1, in_pop2};
    esac;

  init(StackSize) := size0;  -- stack begins empty
  next(StackSize) := case
      -- push one item on the stack
      StackSize=size0 & instr=in_push1 : size1;
      StackSize=size1 & instr=in_push1 : size2;
      StackSize=size2 & instr=in_push1 : size3;
      StackSize=size3 & instr=in_push1 : sizeBig;
      StackSize=sizeBig & instr=in_push1 : sizeBig;
      -- pop one item from the stack
      StackSize=size1 & instr=in_pop1 : size0;
      StackSize=size2 & instr=in_pop1 : size1;
      StackSize=size3 & instr=in_pop1 : size2;
      -- Size after popping from a "big" stack is nondeterministic
      -- since we lost information.
      StackSize=sizeBig & instr=in_pop1 : {size3, sizeBig};
      -- pop two items from the stack
      StackSize=size2 & instr=in_pop2 : size0;
      StackSize=size3 & instr=in_pop2 : size1;
      -- Size after popping from a "big" stack is nondeterministic
      -- since we lost information.
      StackSize=sizeBig & instr=in_pop2 : {size2, sizeBig};
      -- anything else is undefined
      1 : sizeUndefined;
    esac;

-- These are erroneous in JVM. They should never be generated by
-- compilers and should be caught by the verifier.
TRANS
  StackSize=size0 -> !(instr=in_pop1)
TRANS
  StackSize=size0 -> !(instr=in_pop2)
TRANS
  StackSize=size1 -> !(instr=in_pop2)

SPEC AG (Sound -> (! StackSize=sizeUndefined))

-- push one item on the stack
SPEC AG (Sound -> (StackSize=size0 & instr=in_push1 ->
           AX (Sound -> (StackSize=size1))))
SPEC AG (Sound -> (StackSize=size1 & instr=in_push1 ->
           AX (Sound -> (StackSize=size2))))
SPEC AG (Sound -> (StackSize=size2 & instr=in_push1 ->
           AX (Sound -> (StackSize=size3))))
SPEC AG (Sound -> (StackSize=size3 & instr=in_push1 ->
           AX (Sound -> (StackSize=sizeBig))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_push1 ->
           AX (Sound -> (StackSize=sizeBig))))

-- pop one item from the stack
SPEC AG (Sound -> (StackSize=size1 & instr=in_pop1 ->
           AX (Sound -> (StackSize=size0))))
SPEC AG (Sound -> (StackSize=size2 & instr=in_pop1 ->
           AX (Sound -> (StackSize=size1))))
SPEC AG (Sound -> (StackSize=size3 & instr=in_pop1 ->
           AX (Sound -> (StackSize=size2))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_pop1 ->
           AX (Sound -> (StackSize=size3 | StackSize=sizeBig))))

-- pop two items from the stack
SPEC AG (Sound -> (StackSize=size2 & instr=in_pop2 ->
           AX (Sound -> (StackSize=size0))))
SPEC AG (Sound -> (StackSize=size3 & instr=in_pop2 ->
           AX (Sound -> (StackSize=size1))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_pop2 ->
           AX (Sound -> (StackSize=size2 | StackSize=sizeBig))))

-- push1, pop1 returns stack to the same state
SPEC AG (Sound -> (StackSize=size0 & instr=in_push1 ->
           AX (Sound -> (instr=in_pop1 ->
             AX (Sound -> (StackSize=size0))))))
SPEC AG (Sound -> (StackSize=size1 & instr=in_push1 ->
           AX (Sound -> (instr=in_pop1 ->
             AX (Sound -> (StackSize=size1))))))
SPEC AG (Sound -> (StackSize=size2 & instr=in_push1 ->
           AX (Sound -> (instr=in_pop1 ->
             AX (Sound -> (StackSize=size2))))))
SPEC AG (Sound -> (StackSize=size3 & instr=in_push1 ->
           AX (Sound -> (instr=in_pop1 ->
             AX (Sound -> (StackSize=size3))))))

-- push1, push1, pop2 returns stack to the same state
SPEC AG (Sound -> (StackSize=size0 & instr=in_push1 ->
           AX (Sound -> (instr=in_push1 ->
             AX (Sound -> (instr=in_pop2 ->
               AX (Sound -> (StackSize=size0))))))))
SPEC AG (Sound -> (StackSize=size1 & instr=in_push1 ->
           AX (Sound -> (instr=in_push1 ->
             AX (Sound -> (instr=in_pop2 ->
               AX (Sound -> (StackSize=size1))))))))
SPEC AG (Sound -> (StackSize=size2 & instr=in_push1 ->
           AX (Sound -> (instr=in_push1 ->
             AX (Sound -> (instr=in_pop2 ->
               AX (Sound -> (StackSize=size2))))))))
SPEC AG (Sound -> (StackSize=size3 & instr=in_push1 ->
           AX (Sound -> (instr=in_push1 ->
             AX (Sound -> (instr=in_pop2 ->
               AX (Sound -> (StackSize=size3))))))))


-- $Id: javaStack2.smv,v 1.3 1999/08/05 13:51:05 black Exp $
-- *created  "Fri Jun 26 11:20:23 1998" *by "Paul E. Black"
-- *modified "Thu Aug  5 09:48:18 1999" *by "Paul E. Black"

-- first try at state machine abstraction of
--   Java Smart Card virtual machine
-- this just models the first few places of the operand stack

-- This version models the stack size with an integer subrange so
--   we can succinctly model next size (just +1, -1, or -2).
--   However we must save previous values of StackSize since
--   SPEC clause values are no longer based entirely on cases.

MODULE main
VAR
  -- system inputs
  instr : {in_push1, in_pop1, in_pop2};

  -- internal states
  Sound : boolean;
  StackSize : 0..5;  -- 4 is Big, 5 is undefined
  PStackSize : 0..5;
  PPStackSize : 0..5;
  PPPStackSize : 0..5;

-- SKIMP stack overflow is an exception which
-- is not caught and terminates the program.

DEFINE
  sizeBig := 4;
  sizeUndefined := 5;

ASSIGN
  init(Sound) := 1;  -- state begins sound
  next(Sound) := case
      -- abstraction loses accuracy
      StackSize=3 & instr=in_push1 : 0;
      1 : Sound;  -- otherwise soundness is unchanged
    esac;

  -- allow only instructions which don't cause stack underflow
  -- Java compilers should ensure this
  init(instr) := in_push1;
  next(instr) := case
      next(StackSize)=0 : in_push1;
      next(StackSize)=1 : {in_push1, in_pop1};
      1 : {in_push1, in_pop1, in_pop2};
    esac;

  init(StackSize) := 0;  -- stack begins empty
  next(StackSize) := case
      -- push one item on the stack
      StackSize<sizeBig & instr=in_push1 : StackSize+1;
      StackSize=sizeBig & instr=in_push1 : sizeBig;
      -- pop one item from the stack
      StackSize>0 & StackSize<sizeBig & instr=in_pop1 :
                                         StackSize - 1;
      -- Size after popping from a "big" stack is nondeterministic
      -- since we lost information.
      StackSize=sizeBig & instr=in_pop1 : {3, sizeBig};
      -- pop two items from the stack
      StackSize>1 & StackSize<sizeBig & instr=in_pop2 :
                                         StackSize - 2;
      -- Size after popping from a "big" stack is nondeterministic
      -- since we lost information.
      StackSize=sizeBig & instr=in_pop2 : {2, sizeBig};
      -- anything else is undefined
      1 : sizeUndefined;
    esac;

  -- maintain "Previous" values of stack size
  next(PStackSize) := StackSize;
  next(PPStackSize) := PStackSize;
  next(PPPStackSize) := PPStackSize;

-- These are erroneous in JVM. They should never be generated by
-- compilers and should be caught by the verifier.
TRANS
  StackSize=0 -> !(instr=in_pop1)
TRANS
  StackSize=0 -> !(instr=in_pop2)
TRANS
  StackSize=1 -> !(instr=in_pop2)

SPEC AG (Sound -> (! StackSize=sizeUndefined))

-- push one item on the stack
SPEC AG (Sound -> (StackSize<sizeBig & instr=in_push1 ->
           AX (Sound -> (StackSize=PStackSize+1))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_push1 ->
           AX (Sound -> (StackSize=sizeBig))))

-- pop one item from the stack
SPEC AG (Sound -> (StackSize>0 & StackSize<sizeBig
                   & instr=in_pop1 ->
           AX (Sound -> (StackSize=PStackSize - 1))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_pop1 ->
           AX (Sound -> (StackSize=3 | StackSize=sizeBig))))

-- pop two items from the stack
SPEC AG (Sound -> (StackSize>1 & StackSize<sizeBig
                   & instr=in_pop2 ->
           AX (Sound -> (StackSize=PStackSize - 2))))
SPEC AG (Sound -> (StackSize=sizeBig & instr=in_pop2 ->
           AX (Sound -> (StackSize=2 | StackSize=sizeBig))))

-- push1, pop1 returns stack to the same state
SPEC AG (Sound -> (instr=in_push1 ->
           AX (Sound -> (instr=in_pop1 ->
             AX (Sound -> (StackSize=PPStackSize))))))

-- push1, push1, pop2 returns stack to the same state
SPEC AG (Sound -> (instr=in_push1 ->
           AX (Sound -> (instr=in_push1 ->
             AX (Sound -> (instr=in_pop2 ->
               AX (Sound -> (StackSize=PPPStackSize))))))))


C JVM  Stack  Tests 

Table 1 shows the nine tests which resulted from applying our test generation
method. We only show the instruction for each step: the stack size is easily
determined. In the model, the type of instruction in a step changes the stack
size in the next step. Thus the final step of each test checks the stack size, and
the choice of final instruction is irrelevant. The SMV model checker [19] chose
push1 as the last instruction of each test; we leave out that last instruction from
this table.


Test  Instruction types
1     push1 pop1
2     push1 push1 pop1 push1 pop2
3     push1 push1 pop1 pop1
4     push1 push1 pop2
5     push1 push1 push1 pop1 pop2
6     push1 push1 push1 pop1 push1 pop2
7     push1 push1 push1 pop1 pop1
8     push1 push1 push1 pop2 pop1
9     push1 push1 push1 pop2 push1 pop2

Table 1. Generated Stack Tests


Although  there  are  similarities  between  tests,  we  need  all  nine  tests  for  100% 
mutation  coverage. 
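As a sanity check, the nine tests can be replayed against a concrete stack-size counter. This is a minimal sketch, assuming the instruction effects described in Sect. 6 (push1 adds one item, pop1 removes one, pop2 removes two) and the focus limit of three items; the names `TESTS`, `replay`, and `is_sound` are hypothetical.

```python
from typing import Dict, List

# The nine tests of Table 1 (trailing push1 omitted, as in the table).
TESTS: Dict[int, List[str]] = {
    1: ["push1", "pop1"],
    2: ["push1", "push1", "pop1", "push1", "pop2"],
    3: ["push1", "push1", "pop1", "pop1"],
    4: ["push1", "push1", "pop2"],
    5: ["push1", "push1", "push1", "pop1", "pop2"],
    6: ["push1", "push1", "push1", "pop1", "push1", "pop2"],
    7: ["push1", "push1", "push1", "pop1", "pop1"],
    8: ["push1", "push1", "push1", "pop2", "pop1"],
    9: ["push1", "push1", "push1", "pop2", "push1", "pop2"],
}

EFFECT = {"push1": 1, "pop1": -1, "pop2": -2}  # change in stack size
FOCUS_LIMIT = 3  # sizes above this map to the unsound "big" state


def replay(instructions: List[str]) -> List[int]:
    """Return the stack size after each instruction, starting from an empty stack."""
    size = 0
    sizes = []
    for instr in instructions:
        size += EFFECT[instr]
        if size < 0:
            raise ValueError("stack underflow")
        sizes.append(size)
    return sizes


def is_sound(instructions: List[str]) -> bool:
    """A test is sound if the stack never grows beyond the focus limit."""
    return max(replay(instructions), default=0) <= FOCUS_LIMIT
```

Every test in the table stays within the focus and never underflows, which is consistent with Lemma 1: no counterexample passes through an unsound state.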